Data Loading

Using PowerQuery, I imported the data from a CSV file. After analyzing the column descriptions, I confirmed that none of them contained errors or missing values. However, I quickly noticed that the dataset includes missing days — some dates are absent from the data. Interestingly, in both datasets (alr_d, ing_d), the missing days are exactly the same.

Alior company data
Alior company data
Dane spółki ING
Dane spółki ING

Missing Data Analysis

I checked what percentage of the entire dataset is made up of missing days. If it were less than 5%, they could have been omitted. However, in our case, the missing data accounts for as much as 32%, so ignoring them could distort the entire distribution analysis.

Generated set of all dates in the analyzed period
Generated set of all dates in the analyzed period
Final dataset in Excel
Final dataset in Excel

Data Preparation

The table with the required data was saved to a CSV file. Then, in VSCode, I replaced commas , with dots . (06_change_comma_to_dot), because Excel in my region saves numbers with a comma as the decimal separator.

Separator replacement
Separator replacement

Linear Interpolation of Missing Values

The next step is to perform linear interpolation of the missing data.

I load the data into R and split it into the appropriate columns:

data <- read.csv(
  file = "data.csv",
  header = TRUE,
  sep = ";"
)
data
data_dates <- data$date
data_alior <- data$close_alr
data_ing <- data$close_ing

I use the approx function in R for linear interpolation.

The approx function works as follows:

Below is my custom function that fills in the missing values:

linear_interpolation <- function(data) {
  na_indexes <- c()
  no_na_indexes <- c()
  no_na_values <- c()
  for (i in 1:length(data)) {
    if (is.na(data[i])) {
      na_indexes <- c(na_indexes, i)
    } else {
      no_na_indexes <- c(no_na_indexes, i)
      no_na_values <- c(no_na_values, data[i])
    }
  }
  
  interpolated_data <- approx(
    x = no_na_indexes,
    y = no_na_values,
    xout = na_indexes,
    rule = 2
  )
  
  response <- data
  response[na_indexes] <- interpolated_data$y
  
  return(response)
}

I perform the interpolation separately for each column:

data_alior_int <- linear_interpolation(data_alior)
data_ing_int <- linear_interpolation(data_ing)

Saving the interpolated data

I save the interpolated data into a CSV file so that it can be used for further analysis:

new_data <- data.frame(
  date = data_dates,
  closes_alior = data_alior_int,
  closes_ing = data_ing_int
)

write.csv(
  x = new_data,
  file = "int_data.csv",
  row.names = FALSE
)
new_data

Thanks to this step, the dataset is now complete, and the missing values have been filled using linear interpolation.
This ensures that the data is ready for further analysis of the closing price distributions.

LS0tCnRpdGxlOiAiQ2hhcHRlciAyOiBEYXRhIFByZXBhcmF0aW9uIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyBEYXRhIExvYWRpbmcKClVzaW5nICoqUG93ZXJRdWVyeSoqLCBJIGltcG9ydGVkIHRoZSBkYXRhIGZyb20gYSBDU1YgZmlsZS4gQWZ0ZXIgYW5hbHl6aW5nIHRoZSBjb2x1bW4gZGVzY3JpcHRpb25zLCBJIGNvbmZpcm1lZCB0aGF0IG5vbmUgb2YgdGhlbSBjb250YWluZWQgZXJyb3JzIG9yIG1pc3NpbmcgdmFsdWVzLiBIb3dldmVyLCBJIHF1aWNrbHkgbm90aWNlZCB0aGF0IHRoZSBkYXRhc2V0IGluY2x1ZGVzICoqbWlzc2luZyBkYXlzKiog4oCUIHNvbWUgZGF0ZXMgYXJlIGFic2VudCBmcm9tIHRoZSBkYXRhLiBJbnRlcmVzdGluZ2x5LCBpbiBib3RoIGRhdGFzZXRzIChgYWxyX2RgLCBgaW5nX2RgKSwgdGhlIG1pc3NpbmcgZGF5cyBhcmUgZXhhY3RseSB0aGUgc2FtZS4KCiFbKkFsaW9yIGNvbXBhbnkgZGF0YSpdKC4uL3NjcmVlbnNob3RzLzAxX2Fsci5wbmcpICAKCgohWypEYW5lIHNww7PFgmtpIElORypdKC4uL3NjcmVlbnNob3RzLzAyX2luZy5wbmcpCgojIyBNaXNzaW5nIERhdGEgQW5hbHlzaXMKCkkgY2hlY2tlZCB3aGF0IHBlcmNlbnRhZ2Ugb2YgdGhlIGVudGlyZSBkYXRhc2V0IGlzIG1hZGUgdXAgb2YgbWlzc2luZyBkYXlzLiBJZiBpdCB3ZXJlIGxlc3MgdGhhbiA1JSwgdGhleSBjb3VsZCBoYXZlIGJlZW4gb21pdHRlZC4gSG93ZXZlciwgaW4gb3VyIGNhc2UsIHRoZSBtaXNzaW5nIGRhdGEgYWNjb3VudHMgZm9yIGFzIG11Y2ggYXMgKiozMiUqKiwgc28gaWdub3JpbmcgdGhlbSBjb3VsZCBkaXN0b3J0IHRoZSBlbnRpcmUgZGlzdHJpYnV0aW9uIGFuYWx5c2lzLgoKIVsqR2VuZXJhdGVkIHNldCBvZiBhbGwgZGF0ZXMgaW4gdGhlIGFuYWx5emVkIHBlcmlvZCpdKC4uL3NjcmVlbnNob3RzLzA0X2ltcG9ydGVkX2RhdGVzLnBuZykgIAoKCiFbKkZpbmFsIGRhdGFzZXQgaW4gRXhjZWwqXSguLi9zY3JlZW5zaG90cy8wNV9yZWFkeV9kYXRhX2V4Y2VsLnBuZykKCiMjIERhdGEgUHJlcGFyYXRpb24KClRoZSB0YWJsZSB3aXRoIHRoZSByZXF1aXJlZCBkYXRhIHdhcyBzYXZlZCB0byBhIENTViBmaWxlLiBUaGVuLCBpbiAqKlZTQ29kZSoqLCBJIHJlcGxhY2VkIGNvbW1hcyBgLGAgd2l0aCBkb3RzIGAuYCAoYDA2X2NoYW5nZV9jb21tYV90b19kb3RgKSwgYmVjYXVzZSBFeGNlbCBpbiBteSByZWdpb24gc2F2ZXMgbnVtYmVycyB3aXRoIGEgY29tbWEgYXMgdGhlIGRlY2ltYWwgc2VwYXJhdG9yLiAgCgohWypTZXBhcmF0b3IgcmVwbGFjZW1lbnQqXSguLi9zY3JlZW5zaG90cy8wNl9jaGFuZ2VfY29tbWFfdG9fZG90LnBuZykKCgojIyBMaW5lYXIgSW50ZXJwb2xhdGlvbiBvZiBNaXNzaW5nIFZhbHVlcwoKVGhlIG5leHQgc3RlcCBpcyB0byBwZXJmb3JtICoqbGluZWFyIGludGVycG9sYXRpb24qKiBvZiB0aGUgbWlzc2luZyBkYXRhLiAgCgpJIGxvYWQgdGhlIGRhdGEgaW50byBSIGFuZCBzcGxpdCBpdCBpbnRvIHRoZSBhcHByb3ByaWF0ZSBjb2x1bW5zOgoKYGBge3J9CmRhdGEgPC0gcmVhZC5jc3YoCiAgZmlsZSA9ICJkYXRhLmNzdiIsCiAgaGVhZGVyID0gVFJVRSwKICBzZXAgPSAiOyIKKQpgYGAKCmBgYHtyfQpkYXRhCmBgYAoKYGBge3J9CmRhdGFfZGF0ZXMgPC0gZGF0YSRkYXRlCmRhdGFfYWxpb3IgPC0gZGF0YSRjbG9zZV9hbHIKZGF0YV9pbmcgPC0gZGF0YSRjbG9zZV9pbmcKYGBgCgpJIHVzZSB0aGUgYGFwcHJveGAgZnVuY3Rpb24gaW4gUiBmb3IgbGluZWFyIGludGVycG9sYXRpb24uCgpUaGUgYGFwcHJveGAgZnVuY3Rpb24gd29ya3MgYXMgZm9sbG93czoKCi0gSXQgdGFrZXMgYSB2ZWN0b3Igb2Yga25vd24gcG9pbnRzIChgeGAgYW5kIGB5YCkuICAKCi0gSXQgY29tcHV0ZXMgdmFsdWVzIGF0IHRhcmdldCBwb2ludHMgKGB4b3V0YCkgdGhyb3VnaCAqKmxpbmVhciBpbnRlcnBvbGF0aW9uKiosIGkuZS4sIGl0IGRldGVybWluZXMgYSBwb2ludCBvbiB0aGUgbGluZSBjb25uZWN0aW5nIHR3byBuZWlnaGJvcmluZyBrbm93biBwb2ludHMuICAKCi0gVGhlIHBhcmFtZXRlciBgcnVsZSA9IDJgIGVuc3VyZXMgdGhhdCB2YWx1ZXMgb3V0c2lkZSB0aGUgcmFuZ2Ugb2Yga25vd24gcG9pbnRzIGFyZSAqKmV4dHJhcG9sYXRlZCoqIGluc3RlYWQgb2YgcmV0dXJuaW5nIE5BLiAgCgpCZWxvdyBpcyBteSBjdXN0b20gZnVuY3Rpb24gdGhhdCBmaWxscyBpbiB0aGUgbWlzc2luZyB2YWx1ZXM6CgoKYGBge3J9CmxpbmVhcl9pbnRlcnBvbGF0aW9uIDwtIGZ1bmN0aW9uKGRhdGEpIHsKICBuYV9pbmRleGVzIDwtIGMoKQogIG5vX25hX2luZGV4ZXMgPC0gYygpCiAgbm9fbmFfdmFsdWVzIDwtIGMoKQogIGZvciAoaSBpbiAxOmxlbmd0aChkYXRhKSkgewogICAgaWYgKGlzLm5hKGRhdGFbaV0pKSB7CiAgICAgIG5hX2luZGV4ZXMgPC0gYyhuYV9pbmRleGVzLCBpKQogICAgfSBlbHNlIHsKICAgICAgbm9fbmFfaW5kZXhlcyA8LSBjKG5vX25hX2luZGV4ZXMsIGkpCiAgICAgIG5vX25hX3ZhbHVlcyA8LSBjKG5vX25hX3ZhbHVlcywgZGF0YVtpXSkKICAgIH0KICB9CiAgCiAgaW50ZXJwb2xhdGVkX2RhdGEgPC0gYXBwcm94KAogICAgeCA9IG5vX25hX2luZGV4ZXMsCiAgICB5ID0gbm9fbmFfdmFsdWVzLAogICAgeG91dCA9IG5hX2luZGV4ZXMsCiAgICBydWxlID0gMgogICkKICAKICByZXNwb25zZSA8LSBkYXRhCiAgcmVzcG9uc2VbbmFfaW5kZXhlc10gPC0gaW50ZXJwb2xhdGVkX2RhdGEkeQogIAogIHJldHVybihyZXNwb25zZSkKfQpgYGAKCkkgcGVyZm9ybSB0aGUgaW50ZXJwb2xhdGlvbiBzZXBhcmF0ZWx5IGZvciBlYWNoIGNvbHVtbjoKCmBgYHtyfQpkYXRhX2FsaW9yX2ludCA8LSBsaW5lYXJfaW50ZXJwb2xhdGlvbihkYXRhX2FsaW9yKQpkYXRhX2luZ19pbnQgPC0gbGluZWFyX2ludGVycG9sYXRpb24oZGF0YV9pbmcpCmBgYAoKIyMjIFNhdmluZyB0aGUgaW50ZXJwb2xhdGVkIGRhdGEKCkkgc2F2ZSB0aGUgaW50ZXJwb2xhdGVkIGRhdGEgaW50byBhIENTViBmaWxlIHNvIHRoYXQgaXQgY2FuIGJlIHVzZWQgZm9yIGZ1cnRoZXIgYW5hbHlzaXM6CgpgYGB7cn0KbmV3X2RhdGEgPC0gZGF0YS5mcmFtZSgKICBkYXRlID0gZGF0YV9kYXRlcywKICBjbG9zZXNfYWxpb3IgPSBkYXRhX2FsaW9yX2ludCwKICBjbG9zZXNfaW5nID0gZGF0YV9pbmdfaW50CikKCndyaXRlLmNzdigKICB4ID0gbmV3X2RhdGEsCiAgZmlsZSA9ICJpbnRfZGF0YS5jc3YiLAogIHJvdy5uYW1lcyA9IEZBTFNFCikKYGBgCgpgYGB7cn0KbmV3X2RhdGEKYGBgCgpUaGFua3MgdG8gdGhpcyBzdGVwLCB0aGUgZGF0YXNldCBpcyBub3cgY29tcGxldGUsIGFuZCB0aGUgbWlzc2luZyB2YWx1ZXMgaGF2ZSBiZWVuIGZpbGxlZCB1c2luZyBsaW5lYXIgaW50ZXJwb2xhdGlvbi4gIApUaGlzIGVuc3VyZXMgdGhhdCB0aGUgZGF0YSBpcyByZWFkeSBmb3IgZnVydGhlciBhbmFseXNpcyBvZiB0aGUgY2xvc2luZyBwcmljZSBkaXN0cmlidXRpb25zLgo=